navigation goal
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Asia > Singapore (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Information Technology (0.70)
- Transportation > Ground > Road (0.70)
Let Humanoids Hike! Integrative Skill Development on Complex Trails
Hiking on complex trails demands balance, agility, and adaptive decision-making over unpredictable terrain. Current humanoid research remains fragmented and inadequate for hiking: locomotion focuses on motor skills without long-term goals or situational awareness, while semantic navigation overlooks real-world embodiment and local terrain variability. We propose training humanoids to hike on complex trails, driving integrative skill development across visual perception, decision making, and motor execution. We develop a learning framework, LEGO-H, that enables a vision-equipped humanoid robot to hike complex trails autonomously. We introduce two technical innovations: 1) A temporal vision transformer variant, tailored to a Hierarchical Reinforcement Learning framework, anticipates future local goals to guide movement, seamlessly integrating locomotion with goal-directed navigation. 2) Latent representations of joint movement patterns, combined with hierarchical metric learning that enhances the Privileged Learning scheme, enable smooth policy transfer from privileged training to onboard execution. These components allow LEGO-H to handle diverse physical and environmental challenges without relying on predefined motion patterns. Experiments across varied simulated trails and robot morphologies highlight LEGO-H's versatility and robustness, positioning hiking as a compelling testbed for embodied autonomy and LEGO-H as a baseline for future humanoid development.
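The hierarchical control structure described above can be sketched as a two-rate loop: a high-level module (the temporal vision transformer in LEGO-H) proposes local goals from recent visual observations, while a low-level policy turns proprioception plus the current goal into joint commands. The class and function names below are illustrative stand-ins, not the actual LEGO-H components.

```python
# Two-rate hierarchical control loop sketch (hypothetical interfaces; the real
# LEGO-H modules are learned networks, these are placeholder stubs).
import numpy as np

class HighLevelGoalPredictor:
    """Stand-in for the temporal vision transformer: maps a short history of
    visual features to the next local navigation goal (x, y in the robot frame)."""
    def predict(self, visual_history: np.ndarray) -> np.ndarray:
        # Placeholder heuristic: average the feature history and take two values.
        return visual_history.mean(axis=0)[:2]

class LowLevelLocomotionPolicy:
    """Stand-in for the learned motor policy: maps proprioception plus the
    current local goal to joint position targets."""
    def __init__(self, num_joints: int = 12):
        self.num_joints = num_joints

    def act(self, proprioception: np.ndarray, local_goal: np.ndarray) -> np.ndarray:
        # Placeholder output; a trained policy would produce gait commands here.
        return np.zeros(self.num_joints)

def control_loop(steps: int = 100, goal_horizon: int = 10) -> np.ndarray:
    high, low = HighLevelGoalPredictor(), LowLevelLocomotionPolicy()
    visual_history = np.zeros((4, 64))       # last 4 frames of visual features
    local_goal = np.zeros(2)
    joint_targets = np.zeros(low.num_joints)
    for t in range(steps):
        if t % goal_horizon == 0:            # re-plan the local goal at a slower rate
            local_goal = high.predict(visual_history)
        proprioception = np.zeros(32)        # joint angles, velocities, IMU, ...
        joint_targets = low.act(proprioception, local_goal)
        # joint_targets would be sent to the simulator / robot here
    return joint_targets

if __name__ == "__main__":
    control_loop()
```

The key design point is the slower re-planning rate of the high-level module, which lets locomotion stay reactive while navigation stays goal-directed.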
- North America > United States > Michigan (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces
Zhao, Baining, Fang, Jianjie, Dai, Zichao, Wang, Ziyou, Zha, Jirong, Zhang, Weichen, Gao, Chen, Wang, Yue, Cui, Jinqiang, Chen, Xinlei, Li, Yong
Large multimodal models exhibit remarkable intelligence, yet their embodied cognitive abilities during motion in open-ended urban 3D space remain to be explored. We introduce a benchmark to evaluate whether video-large language models (Video-LLMs) can naturally process continuous first-person visual observations like humans, enabling recall, perception, reasoning, and navigation. We manually controlled drones to collect 3D embodied motion video data from real-world cities and simulated environments, resulting in 1.5k video clips, and then designed a pipeline to generate 5.2k multiple-choice questions. Evaluations of 17 widely-used Video-LLMs reveal current limitations in urban embodied cognition. Correlation analysis provides insight into the relationships between different tasks, showing that causal reasoning has a strong correlation with recall, perception, and navigation, while the abilities for counterfactual and associative reasoning exhibit lower correlation with other tasks. We also validate the potential for Sim-to-Real transfer in urban embodiment through fine-tuning.
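The evaluation protocol implied by the benchmark, multiple-choice questions scored per task, can be captured in a few lines; the question schema and model interface below are assumptions for illustration, not the released UrbanVideo-Bench code.

```python
# Minimal multiple-choice evaluation loop of the kind the benchmark implies
# (the question schema and model interface are assumptions, not the released code).
from collections import defaultdict

def evaluate(questions, answer_fn):
    """questions: dicts with 'task', 'video', 'question', 'options', 'answer'.
    answer_fn: callable(video, question, options) -> chosen option key."""
    correct, total = defaultdict(int), defaultdict(int)
    for q in questions:
        pred = answer_fn(q["video"], q["question"], q["options"])
        total[q["task"]] += 1
        correct[q["task"]] += int(pred == q["answer"])
    return {task: correct[task] / total[task] for task in total}

# Toy usage with a dummy model that always answers "A".
sample = [
    {"task": "recall", "video": "clip_001.mp4",
     "question": "Which landmark was passed first?",
     "options": {"A": "bridge", "B": "tower"}, "answer": "A"},
]
print(evaluate(sample, lambda video, question, options: "A"))
```

Per-task accuracies computed this way are also what the reported cross-task correlation analysis would operate on.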
- Transportation > Ground > Road (1.00)
- Health & Medicine (0.76)
- Transportation > Infrastructure & Services (0.68)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.46)
Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration
Shen, Zhixuan, Luo, Haonan, Chen, Kexun, Lv, Fengmao, Li, Tianrui
Understanding how humans cooperatively utilize semantic knowledge to explore unfamiliar environments and decide on navigation directions is critical for household service multi-robot systems. Previous methods primarily focused on single-robot centralized planning strategies, which severely limited exploration efficiency. Recent research has considered decentralized planning strategies for multiple robots, assigning separate planning models to each robot, but these approaches often overlook communication costs. In this work, we propose Multimodal Chain-of-Thought Co-Navigation (MCoCoNav), a modular approach that utilizes multimodal Chain-of-Thought to plan collaborative semantic navigation for multiple robots. MCoCoNav combines visual perception with Vision Language Models (VLMs) to evaluate exploration value through probabilistic scoring, thus reducing time costs and achieving stable outputs. Additionally, a global semantic map is used as a communication bridge, minimizing communication overhead while integrating observational results. Guided by scores that reflect exploration trends, robots utilize this map to assess whether to explore new frontier points or revisit history nodes. Experiments on HM3D_v0.2 and MP3D demonstrate the effectiveness of our approach. Our code is available at https://github.com/FrankZxShen/MCoCoNav.git.
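The score-guided decision the abstract describes, whether to explore a new frontier or revisit a history node on the shared global semantic map, can be sketched as below, with a placeholder scoring function standing in for the VLM chain-of-thought scores.

```python
# Score-guided target selection sketch; score_fn is a placeholder for the
# VLM-derived exploration-value scores described in the abstract.
import math

def select_target(robot_xy, frontiers, history_nodes, score_fn, revisit_threshold=0.3):
    """frontiers / history_nodes: (x, y) points on the shared global semantic map.
    Returns ('explore', frontier) or ('revisit', history_node)."""
    scored = [(score_fn(p), p) for p in frontiers]
    if scored:
        best_score, best_frontier = max(scored, key=lambda sp: sp[0])
        if best_score >= revisit_threshold:
            return "explore", best_frontier
    # Every new frontier looks unpromising: fall back to the nearest history node.
    nearest = min(history_nodes, key=lambda p: math.dist(robot_xy, p))
    return "revisit", nearest

# Toy usage: distance-based scores stand in for the VLM scores.
print(select_target((0, 0), [(2, 1), (5, 5)], [(1, 0)],
                    score_fn=lambda p: 1.0 / (1.0 + math.dist((0, 0), p))))
```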
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)
- Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.54)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.46)
Unified Understanding of Environment, Task, and Human for Human-Robot Interaction in Real-World Environments
Yano, Yuga, Mizutani, Akinobu, Fukuda, Yukiya, Kanaoka, Daiju, Ono, Tomohiro, Tamukoh, Hakaru
To facilitate human-robot interaction (HRI) tasks in real-world scenarios, service robots must adapt to dynamic environments and understand the required tasks while effectively communicating with humans. To accomplish HRI in practice, we propose a novel indoor dynamic map, task understanding system, and response generation system. The indoor dynamic map optimizes robot behavior by managing an occupancy grid map and dynamic information, such as furniture and humans, in separate layers. The task understanding system targets tasks that require multiple actions, such as serving ordered items. Task representations that predefine the flow of necessary actions are applied to achieve highly accurate understanding. The response generation system is executed in parallel with task understanding to facilitate smooth HRI by informing humans of the subsequent actions of the robot. In this study, we focused on waiter duties in a restaurant setting as a representative application of HRI in a dynamic environment. We developed an HRI system that could perform tasks such as serving food and cleaning up while communicating with customers. In experiments conducted in a simulated restaurant environment, the proposed HRI system successfully communicated with customers and served ordered food with 90% accuracy. In a questionnaire administered after the experiment, the HRI system of the robot received 4.2 points out of 5. These outcomes indicated the effectiveness of the proposed method and HRI system in executing waiter tasks in real-world environments.
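The layered indoor dynamic map can be illustrated with a small data structure that keeps the static occupancy grid and the tracked dynamic entities in separate layers; the field names and the simple clearance check are assumptions for the sketch, not the authors' implementation.

```python
# Layered dynamic map sketch: static occupancy grid plus separately tracked
# dynamic entities (layer names and fields are illustrative assumptions).
import numpy as np

class DynamicMap:
    def __init__(self, width, height, resolution=0.05):
        # Static layer: occupancy grid from SLAM (0 = free, 1 = occupied).
        self.occupancy = np.zeros((height, width), dtype=np.uint8)
        # Dynamic layer: movable entities, updated without touching the static layer.
        self.dynamic = {}   # id -> {"kind": "human" or "furniture", "xy": (x, y)}
        self.resolution = resolution

    def update_entity(self, entity_id, kind, xy):
        self.dynamic[entity_id] = {"kind": kind, "xy": xy}

    def is_blocked(self, xy, clearance=0.4):
        gx, gy = int(xy[0] / self.resolution), int(xy[1] / self.resolution)
        if self.occupancy[gy, gx]:
            return True
        return any(np.hypot(xy[0] - e["xy"][0], xy[1] - e["xy"][1]) < clearance
                   for e in self.dynamic.values())

m = DynamicMap(200, 200)
m.update_entity("customer_1", "human", (3.0, 2.5))
print(m.is_blocked((3.1, 2.6)))   # True: a tracked human occupies this area
```

Keeping dynamic information out of the static layer is what lets the robot update a moving customer or a relocated chair without rebuilding the occupancy grid.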
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
- South America > Brazil (0.04)
- Consumer Products & Services > Restaurants (1.00)
- Education > Educational Setting (0.70)
OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following
Shi, Haochen, Sun, Zhiyuan, Yuan, Xingdi, Côté, Marc-Alexandre, Liu, Bang
Embodied Instruction Following (EIF) is a crucial task in embodied learning, requiring agents to interact with their environment through egocentric observations to fulfill natural language instructions. Recent advancements have seen a surge in employing large language models (LLMs) within a framework-centric approach to enhance performance in embodied learning tasks, including EIF. Despite these efforts, there is no unified understanding of how various components, ranging from visual perception to action execution, affect task performance. To address this gap, we introduce OPEx, a comprehensive framework that delineates the core components essential for solving embodied learning tasks: Observer, Planner, and Executor. Through extensive evaluations, we provide a deep analysis of how each component influences EIF task performance. Furthermore, we innovate within this space by deploying a multi-agent dialogue strategy on a TextWorld counterpart, further enhancing task performance. Our findings reveal that LLM-centric design markedly improves EIF outcomes, identify visual perception and low-level action execution as critical bottlenecks, and demonstrate that augmenting LLMs with a multi-agent framework further elevates performance.
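The Observer, Planner, and Executor decomposition can be made concrete with stub classes wired into a single step function; the interfaces below are assumed for illustration, whereas OPEx instantiates them with learned perception models and an LLM-based planner.

```python
# Stub version of the Observer / Planner / Executor pipeline (interfaces assumed
# for illustration; OPEx uses learned perception and an LLM-based planner).
class Observer:
    """Turns raw egocentric observations into a symbolic scene description."""
    def observe(self, frame):
        return {"visible_objects": [], "agent_pose": (0.0, 0.0, 0.0)}

class Planner:
    """Maps the instruction plus the current scene to the next subgoal."""
    def plan(self, instruction, scene):
        return "find target object" if not scene["visible_objects"] else "pick up target"

class Executor:
    """Grounds a subgoal into low-level environment actions."""
    def execute(self, subgoal):
        return ["RotateRight"] if subgoal == "find target object" else ["PickupObject"]

def step(instruction, frame):
    observer, planner, executor = Observer(), Planner(), Executor()
    scene = observer.observe(frame)
    subgoal = planner.plan(instruction, scene)
    return executor.execute(subgoal)

print(step("Put the mug in the sink", frame=None))   # ['RotateRight']
```

Isolating the three roles this way is also what makes the paper's component-wise analysis possible: each stub can be swapped or ablated independently.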
- North America > United States (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
From Prediction to Planning With Goal Conditioned Lane Graph Traversals
Hallgarten, Marcel, Stoll, Martin, Zell, Andreas
The field of motion prediction for automated driving has seen tremendous progress recently, producing ever more powerful neural network architectures. These powerful models hold great potential for the closely related planning task. In this letter we propose a novel goal-conditioning method and show its potential to transform a state-of-the-art prediction model into a goal-directed planner. Our key insight is that conditioning prediction on a navigation goal at the behaviour level outperforms other widely adopted methods, with the additional benefit of increased model interpretability. We train our model on a large open-source dataset and show promising performance in a comprehensive benchmark.
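Behaviour-level goal conditioning can be pictured as selecting the lane-graph traversal that reaches the route goal and feeding it to the trajectory predictor as an additional input; the breadth-first traversal and the predictor stub below are illustrative, not the paper's model.

```python
# Behaviour-level goal conditioning sketch: find a lane-graph traversal to the
# goal and pass it to the predictor as an extra input (predictor is a stub here).
from collections import deque

def lane_graph_traversal(successors, start_lane, goal_lane):
    """Breadth-first search over lane connectivity; returns lane ids to the goal."""
    queue, parents = deque([start_lane]), {start_lane: None}
    while queue:
        lane = queue.popleft()
        if lane == goal_lane:
            path = []
            while lane is not None:
                path.append(lane)
                lane = parents[lane]
            return path[::-1]
        for nxt in successors.get(lane, []):
            if nxt not in parents:
                parents[nxt] = lane
                queue.append(nxt)
    return None

def plan(predict_fn, scene, successors, start_lane, goal_lane):
    route = lane_graph_traversal(successors, start_lane, goal_lane)
    # Conditioning the prediction model on the intended traversal is what turns
    # a free-form predictor into a goal-directed planner.
    return predict_fn(scene, route)

successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
print(lane_graph_traversal(successors, "A", "D"))   # ['A', 'B', 'D']
```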
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- Automobiles & Trucks (0.89)
- Transportation > Ground > Road (0.68)
AR Point&Click: An Interface for Setting Robot Navigation Goals
Gu, Morris, Croft, Elizabeth, Cosgun, Akansel
This paper considers the problem of designating navigation goal locations for interactive mobile robots. We investigate a point-and-click interface, implemented with an Augmented Reality (AR) headset. The cameras on the AR headset are used to detect natural pointing gestures performed by the user. The selected goal is visualized through the AR headset, allowing the user to adjust the goal location if desired. We conduct a user study in which participants set consecutive navigation goals for the robot using three different interfaces: AR Point&Click, Person Following, and Tablet (bird's-eye map view). Results show that the proposed AR Point&Click interface improved perceived accuracy and efficiency and reduced mental load compared to the baseline tablet interface, and performed on par with the Person Following method. These results show that AR Point&Click is a feasible interaction model for setting navigation goals.
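The geometric core of such an interface, turning a detected pointing ray into a navigation goal, amounts to a ray-floor intersection; the sketch below assumes the gesture detection and AR visualization are handled elsewhere.

```python
# Pointing-ray to navigation-goal sketch: intersect the detected pointing ray
# with the floor plane z = 0 (gesture detection and AR display handled elsewhere).
import numpy as np

def pointing_ray_to_goal(hand_pos, ray_dir, floor_z=0.0):
    """hand_pos: 3D hand position; ray_dir: pointing direction. Returns (x, y) or None."""
    hand_pos = np.asarray(hand_pos, dtype=float)
    ray_dir = np.asarray(ray_dir, dtype=float)
    if abs(ray_dir[2]) < 1e-6:
        return None                      # ray parallel to the floor: no intersection
    t = (floor_z - hand_pos[2]) / ray_dir[2]
    if t <= 0:
        return None                      # pointing away from the floor
    goal = hand_pos + t * ray_dir
    return goal[:2]                      # navigation goal on the floor plane

print(pointing_ray_to_goal(hand_pos=(0.2, 0.0, 1.4), ray_dir=(0.6, 0.2, -0.77)))
```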
- Oceania > Australia (0.04)
- North America > Canada > British Columbia > Vancouver Island > Capital Regional District > Victoria (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.69)
A Simple Approach for Visual Rearrangement: 3D Mapping and Semantic Search
Trabucco, Brandon, Sigurdsson, Gunnar, Piramuthu, Robinson, Sukhatme, Gaurav S., Salakhutdinov, Ruslan
Physically rearranging objects is an important capability for embodied agents. Visual room rearrangement evaluates an agent's ability to rearrange objects in a room to a desired goal based solely on visual input. We propose a simple yet effective method for this problem: (1) search for and map which objects need to be rearranged, and (2) rearrange each object until the task is complete. Our approach consists of an off-the-shelf semantic segmentation model, a voxel-based semantic map, and a semantic search policy to efficiently find objects that need to be rearranged. On the AI2-THOR Rearrangement Challenge, our method improves on current state-of-the-art end-to-end reinforcement learning methods that learn visual rearrangement policies, raising correct rearrangement from 0.53% to 16.56% while using only 2.7% as many samples from the environment.
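Step (1), deciding which objects still need rearranging, can be approximated by comparing semantic voxel maps of the goal and current room states; the map format and voxel-count threshold below are assumptions for the sketch, not the authors' pipeline.

```python
# Goal-vs-current semantic voxel map comparison (map format and threshold are
# assumptions for the sketch, not the authors' pipeline).
import numpy as np

def objects_to_rearrange(goal_map, current_map, min_changed_voxels=5):
    """goal_map / current_map: (X, Y, Z) integer arrays of semantic class ids
    (0 = empty). Returns class ids whose voxel footprint differs between states."""
    classes = np.unique(np.concatenate([goal_map[goal_map > 0],
                                        current_map[current_map > 0]]))
    changed = []
    for cls in classes:
        mismatch = np.logical_xor(goal_map == cls, current_map == cls).sum()
        if mismatch >= min_changed_voxels:
            changed.append(int(cls))
    return changed

goal = np.zeros((4, 4, 4), dtype=int); goal[0, 0, 0] = 3   # e.g. a mug at one corner
cur = np.zeros((4, 4, 4), dtype=int);  cur[3, 3, 0] = 3    # the mug has been moved
print(objects_to_rearrange(goal, cur, min_changed_voxels=2))   # [3]
```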
- North America > United States > California (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)